• Co-founded Anaconda with Travis Oliphant (creator of NumPy & SciPy)
• CEO (former CTO)
• Founded the PyData community & conferences

OPEN CODE = BETTER SCIENCE

• Pythonista for ~24 years
• Open source advocate for ~30 years
Open Source &


Primer: Industrial Capitalism 101 


Industrial-era thinking separates labor from the owners of "capital equipment", i.e., the means of production (and of scale).


Wikipedia, "Capitalism":

Capitalism is an economic system based on the private ownership of the means of production and their operation for profit. Central characteristics of capitalism include capital accumulation, competitive markets, price systems, private property, property rights recognition, voluntary exchange, and wage labor.[6] In a market economy, decision-making and investments are determined by owners of wealth, property, or ability to maneuver capital or production ability in capital and financial markets, whereas prices and the distribution of goods and services are mainly determined by competition in goods and services markets.[7]


Primer: Industrial Capitalism 101 


“Intellectual Property” was applied to Software 
in the ~1960s-70s 


I.C. Software Under the 1976 Act
The Copyright Act of 1976, which became effective on January 1, 1978, made it clear that Congress intended software to be copyrightable. The definition of literary works in Section 101 states that they are:

works, other than audiovisual works, expressed in words, numbers, or other verbal or numerical symbols or indicia, regardless of the nature of the material objects, such as books, periodicals, manuscripts, phonorecords, film, tapes, disks, or cards, in which they are embodied. {FN7: 17 U.S.C. §101}

Furthermore, the House Report discussing the Act states:

The term “literary works” does not connote any criterion of literary merit or qualitative value: it includes catalogs, directories, and similar factual, reference, or instructional works and compilations of data. It also includes computer data bases, and computer programs to the extent that they incorporate authorship in the programmer’s expression of original ideas, as distinguished from the ideas themselves. {FN8: H.R. Rep. No. 94-1476 at 54}

As property, its ownership could be transferred to someone else. By convention and as standard practice, ALL modern employers require the transfer of IP from white-collar workers.


Open Source Proves Software Is Un-Property 
Usufruct
From Wikipedia, the free encyclopedia

Usufruct is a limited real right (or in rem right) found in civil-law and mixed jurisdictions that unites the two property interests of usus and fructus:

• Usus (use) is the right to use or enjoy a thing possessed, directly and without altering it.
• Fructus (fruit, in a figurative sense) is the right to derive profit from a thing possessed: for instance, by selling crops, leasing immovables or annexed movables, taxing for entry, and so on.

A usufruct is either granted in severalty or held in common ownership, as long as the ...


Open Source Software is a special type of un-property
• Confuses most economic engines
• Intrinsically anti-rivalrous
• Sharing increases value
• Forking decreases value
• Should be approached with an abundance mentality


OSS Communities Are Unique Human Ecologies 


• Participatory cultural activity vs. “taking free stuff”. Fruit vs. tree.
• Participation gives you a voice and preserves your agency.
• Accepting other people into the participatory culture requires developing trust with them.
• It's hard to trust corporate brands. Trust is inherently tied to individuals.


Stakeholder economics and social equity in software development primarily persist in OSS human ecologies (“communities”).
The Economics of AI

The World is Changing: We Are at the Dawn of AI


• The people in the global tech industry are 4% of the population, and they have created a $5 trillion industry.
• AI empowers the other 8 BILLION humans to harness computing on their own terms.
• This will redefine how people understand, interact with, and predict the world.


2023 


AI Will Reset the Entire Tech Value Chain

Data, hardware, infrastructure software... were all just components. We can finally simplify the building & delivery of complete predictive solutions.
“The Idea Propagation Value Chain”

Creation → Substantiation → Duplication → Distribution → Consumption

• “Writing unbundled consumption”
• “Printing press unbundled duplication, and Internet unblocked distribution”
• “AI removes the final bottleneck”

https://stratechery.com/2022/the-ai-unbundling/


Industrial Ownership & Control
Machines & inventions belong to, and are controlled by, a single firm.

Open-Source Ownership & Control
Software / Code Base: the open-source codebase is shared across private firms (who may also have private codebases) AND an unofficial community of developers.

AI Ownership & Control?
AI brings at least two novel features to the ownership-and-control question:
(1) The line blurs between data and software, which is shared, to ...
(2) The general public are, ... of data, whether passively or actively.
Everyone Can Build... But Does Everyone Profit?


Commercialization of LLMs will require solving the attribution problem


... and that will not rest solely on the question of Copyright vs 
Fair Use. 


Could we develop a new kind of holistic licensing model? 


• For data
• For code
• For human training efforts
A New License for AI?


All existing Copyright & IP law around code and data is based on historical concepts from mechanical reproduction...


... but LLMs are not mere mechanical reproducers 


Powerful industries are pitched into heated battle... 


... and the stakes will determine the future of their entire industry. 


LLMs are surprisingly powerful but also surprisingly portable & easy to use 


... meaning that “rogue” usage will be rampant 
1. Legal Issues

There is legal uncertainty around the use of LLMs:
• Training data
• Are LLM weights derivative works?
• Are training corpora themselves copyrightable?
• Etc.

(Screenshot: The Wall Street Journal, "Publishers Prepare for Showdown With Microsoft, Google Over AI Tools": "Media executives want compensation for use of their content in ChatGPT, Bing and Bard")
(Screenshot: post by Alex J. Champandard: "HuggingFace is right at the center of a data laundering controversy. It's the hub of the ML industry's Copyright fraud activity! (See link for context.)")
(Screenshot: post stating "This video cuts through the noise and says in no uncertain terms: 1. The current Machine Learning image generators are entirely based on ... 2. "AI" is a misnomer, pure marketing.")
(Screenshot: "NEW VIDEO: AI art is the subject en vogue among the creatives of the internet, with debates over everything from whether it counts as art to whether it counts as theft")
(Screenshot: "4: How Academic and Nonprofit Researchers Shiel...")
Not Just About Copyright!
FEDERAL RIGHT OF PUBLICITY TAKES CENTER STAGE IN SENATE HEARING ON AI


Both senators and witnesses spent significant time advocating for new legislation— 


a federal right of publicity or a federal anti-impersonation right (what one witness 
dubbed the FAIR Act). Discussion of such a federal law occupied more of the 
hearing than predicted and significantly more time than was spent parsing either 


existing copyright law or suggesting changes to copyright law. 


Open? 


Is this a seismic shift for the Open Source Definition? 


OSI dives deep into the topics shaping the future of open source business, ethics and practice with a new kind of event. We'll help OSI stakeholders frame a conversation to discover what's acceptable for AI systems to be “Open Source.”


DEFINING OPEN SOURCE AI
(Open Source Initiative)

The time has come to have a clear definition.
We're driving a multi-stakeholder process to define an “Open Source AI”.


What is “Open” in the context of AI?
Software used to create AI model: ✓
Model itself: ✓
Data used to train the model: ?


2. Until legal uncertainty is resolved, companies are operating in a gray zone

(Table: recent LLMs by initial release date, developer, and license. Models listed: MPT-7B, RedPajama-INCITE, StableVicuna, Cerebras-GPT, GPT4All, Dolly, Bard, Falcon LLM, Alpaca, Pythia, and ChatGPT 3. Release dates listed range from 2023/02/13 to 2023/05/10. Developers include Google, MosaicML, Together, Stability AI, Cerebras, Nomic AI, Databricks, Technology Innovation Institute (TII), Stanford, EleutherAI, and OpenAI. Licenses range from Apache 2.0, MIT, and Varies to Noncommercial, Commercial, and Unknown.)


3. However the lawsuits get decided/appealed, we 
should expect divergences across jurisdictions globally 


• Some companies & devs will YOLO
• Some countries will YOLO
• Some countries will banhammer


Jurisdiction-shopping is already happening. 
The stakes are extremely high. 


4. This issue cannot be resolved in the space of copyright, 
because Deep Learning models are not “copiers” 


They are not “printers”. They are un-printers. 
They are not “cameras”. They are un-cameras. 


In the words of John Perry Barlow, they are able to extract wine from 
any bottle. LLMs destroy the entire foundations of a copyright-based 
Intellectual Property economy. 


5. While Big Tech and Big Copyright are locked in pitched battle, there is an open space & opportunity to build a new path.


AMPL (working title)


AMPL — Anaconda ML Public License 


• A family of licenses (à la Creative Commons) that extend the principles of Open Source to an AI era.
• Empower creators to assert their rights to license their content for machine learning (ML) purposes, going beyond the limitations of copyright law.
• Creators can assert this novel kind of provision as a bespoke license addendum to any existing copyrighted work, even prior to any established legislation or legal precedent.
• Simple user experience: similar to a copyright attestation, and includes a link to the relevant license for the specific content.
• A core aspect of the attestation/assertion is an explicit affirmation of human authorship, and it can provide guidelines on how the human author wishes their work to be used vis-à-vis LLMs and AI.


Flexible Remuneration Can Be Built-In


• Authors can grant permission for free use for research, education, etc.
• The AMPL license family has an opt-in, non-centralized, Commons-oriented equity and compensation model:
  - metadata fields for attribution, monetization, and other relevant information
  - metadata should be easily machine-parseable, facilitating the processing and auditing of large volumes of work
  - licenses with commercial considerations will (by default) stipulate that a small portion of payment goes to a non-profit organization that operates the infrastructure for the overall system and supports a global legal defense fund
  - all default models include a default time-based expiration to the public/free/Commons
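The slide only says the metadata should be machine-parseable; it does not specify a schema. As a minimal sketch, assuming a JSON record with hypothetical field names and a hypothetical 2% default non-profit share (the slide just says "a small portion"), the three mechanics listed above (attribution metadata, a default non-profit portion of commercial payments, and time-based expiration to the Commons) could look like this:

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class AmplDeclaration:
    """Hypothetical machine-parseable AMPL attestation attached to one work."""
    work_id: str
    license: str              # e.g. "AMPL-1"
    human_author: str         # explicit affirmation of human authorship
    attribution: str          # how the author wishes to be credited
    free_for_research: bool   # author grants free use for research/education
    commons_date: str         # ISO date when the work reverts to the Commons


# Hypothetical default share of each commercial payment routed to the
# non-profit that runs the infrastructure and legal defense fund.
# The value 0.02 is an assumption, not part of AMPL.
NONPROFIT_SHARE = 0.02


def split_payment(amount: float, share: float = NONPROFIT_SHARE) -> tuple:
    """Return (creator_portion, nonprofit_portion) for one commercial payment."""
    nonprofit = round(amount * share, 2)
    return round(amount - nonprofit, 2), nonprofit


def is_in_commons(decl: AmplDeclaration, today: date) -> bool:
    """True once the default time-based expiration date has been reached."""
    return today >= date.fromisoformat(decl.commons_date)


decl = AmplDeclaration(
    work_id="example-essay-001",   # hypothetical identifier
    license="AMPL-1",
    human_author="Jane Doe",       # hypothetical author
    attribution="Jane Doe, 2023",
    free_for_research=True,
    commons_date="2043-01-01",     # hypothetical expiration date
)
print(json.dumps(asdict(decl), indent=2))     # the machine-parseable record
print(split_payment(100.0))                   # (98.0, 2.0)
print(is_in_commons(decl, date(2030, 6, 1)))  # False
```

Keeping the record as flat JSON is one way to make auditing large volumes of work a matter of parsing and filtering; a real scheme would need an agreed, versioned schema.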


AMPL Family of Licenses (Examples) 


AMPL-0 - Corpora unrestricted, but derivative models must use a compatible license
AMPL-1 - Corpora contributors agree on fixed residuals from model revenue
AMPL-2 - Corpora contributors agree on attribute-based residuals from model revenue
AMPL-3 - Proprietary corpora are utilized for proprietary model productions

Typically, license levels 1, 2, or 3 will build on AMPL-0 foundational models while tracking the provenance of training/fine-tuning. AMPL-3 may build on AMPL-1 or AMPL-2 models.
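The license lineages and the two residual schemes can be made concrete with a small sketch. It encodes the typical build-on relationships described above and splits a residual pool among corpus contributors. Several pieces are assumptions: reading AMPL-1's "fixed residuals" as an equal split, reading AMPL-2's "attribute-based residual" as a split proportional to per-contributor attribution weights, allowing AMPL-0 to build on AMPL-0, and the hypothetical function names and pool size.

```python
from typing import Dict, List, Optional

# Typical lineages from the slide: levels 1-3 build on AMPL-0 foundation
# models, and AMPL-3 may also build on AMPL-1 or AMPL-2 models.
# Allowing AMPL-0 to build on AMPL-0 is an assumption.
TYPICAL_BASES = {
    "AMPL-0": {"AMPL-0"},
    "AMPL-1": {"AMPL-0"},
    "AMPL-2": {"AMPL-0"},
    "AMPL-3": {"AMPL-0", "AMPL-1", "AMPL-2"},
}


def typical_lineage(derived: str, base: str) -> bool:
    """True if a `derived` model building on a `base` model follows a typical lineage."""
    return base in TYPICAL_BASES.get(derived, set())


def split_residuals(
    pool: float,
    contributors: List[str],
    attribution: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Divide a residual pool (a slice of model revenue) among corpus contributors.

    attribution=None  -> AMPL-1 reading: equal, fixed residual per contributor.
    attribution given -> AMPL-2 reading: residual proportional to attribution weight.
    """
    if not contributors:
        return {}
    if attribution is None:
        share = pool / len(contributors)
        return {name: share for name in contributors}
    total = sum(attribution[name] for name in contributors)
    return {name: pool * attribution[name] / total for name in contributors}


print(typical_lineage("AMPL-3", "AMPL-2"))  # True
print(typical_lineage("AMPL-1", "AMPL-3"))  # False
print(split_residuals(900.0, ["a", "b", "c"]))
print(split_residuals(900.0, ["a", "b", "c"], {"a": 1.0, "b": 2.0, "c": 6.0}))
```

How attribution weights would actually be computed is exactly the open "attribution problem" raised earlier; the sketch only shows what happens once such weights exist.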
NFTs & Micropayments


“Don't call it web3, please” 


• AMPL motivates the creation of an NFT (Non-Fungible Token) for every piece of content, allowing creators to attach new licensing terms for AI purposes.
• Multiple economic models are available for creators to choose from, motivating the use of decentralized currency and exchange systems to automate credit allocation.
• These models range from free commercial use to various pricing structures based on revenue thresholds, equity surcharges, or fixed returns.


Data Unions, Libraries, & Cooperatives 


Although the core idea is that anyone can attach an AMPL license declaration to 
any piece of content, the reality is that the vast majority of AMPL-licensed 
content will be handled through aggregators or curators of “data libraries”. 


Each dataset will likely have uniform permissions around commercial usage, 
licensing costs, etc. Simple market dynamics will motivate creators to form data 
unions, libraries, and/or cooperatives similar to ASCAP.


These “data libraries” will play a crucial role in the movement by incentivizing 
individuals to collectively curate well-organized content and charge for access to 
their collections. 
What Is A Possible Outcome If We Do Nothing?


If we do not create a community-oriented, decentralized, multi-stakeholder 
system for adjudicating data rights around Al, then: 


• Big Tech will create a private settlement with existing major copyright holders.
• Together they will form a de facto licensing cartel (“consortium”) that gatekeeps who can and cannot participate in AI innovation.
• They will lobby politicians in every country and influence international trade organizations to strike down legal challenges to their commercial arrangements.


What Is A Possible Outcome? 
(Wikipedia, "Authors Guild, Inc. v. Google, Inc.")

In September 2005, three authors as well as the Authors Guild of America filed a class action lawsuit against Google and Stanford, Harvard, and the University of Michigan libraries over the Google Print project, citing "massive copyright infringement". The complaint asserted that Google had not sought approval to make scans of the copyrighted books, and asked for an injunction to stop Google from scanning any copyrighted works during the lawsuit. Google countered that its project represented a fair use and is the digital age equivalent of a card catalog ...

Settlement criticisms

In the US, several organizations who took no part of the settlement, such as the American Society of Journalists and Authors, criticized the settlement fundamentally. Moreover, the New York book settlement was not restricted to U.S. authors, but relevant to authors of the whole world. This led to objections even on the level of some European governments and critical voices in many European newspapers. The estate of John Steinbeck argued for and was granted an additional four-month extension for the class to file objections, putting the deadline into October 2009 and with Judge Chin expected to evaluate the settlement in November.


Authors Guild v. Google 721 F.3d 132 (2d Cir. 2015) was a copyright case heard in the United States District Court for the Southern District of New York, and on appeal to the United States Court of Appeals for the Second Circuit between 2005 and 2015. The case concerned fair use in copyright law and the transformation of printed copyrighted books into an online searchable database through scanning and digitization. The case centered on the legality of the Google Book Search (originally named as Google Print) Library Partner project that had been launched in 2003.

Though there was general agreement that Google's attempt to digitize books through scanning and computer-aided recognition for searching online was seen as a transformative step for libraries, many authors and publishers had expressed concern that Google had not sought their permission to make scans of the books still under copyright and offered them to users. Two separate lawsuits, including one from three authors represented by the Authors Guild and another by Association of American Publishers, ...

Full case name: The Authors Guild Inc., et al. v. Google, Inc.
Decided: October 16, 2015 (2d Circuit); November 14, 2013 (SDNY)
Citation(s): 804 F.3d 202
Judge(s) sitting: Denny Chin (SDNY); Pierre N ...


An American author announced on her website her resignation from the Authors' Guild over the settlement, claiming the leadership of the Guild had "sold us [its members] down the river" and that the settlement threatened "the whole concept of copyright." She launched a petition against the settlement, which was signed by almost 300 authors.

In late 2013, after the class action status was challenged, the District Court granted summary judgment in favor of Google, dismissing the lawsuit and affirming the Google Books project met all legal requirements for fair use. The Second Circuit Court of Appeal upheld the District Court's summary judgement in October 2015, ruling Google's "project provides a public service without violating intellectual property law." The Supreme Court subsequently denied a petition to hear the case.
The community that built the code won't be invited to govern the golems.


Even today, we have created 
towering stacks of complex 
software & algorithms: a 
technological caste system. 


In the face of all these different approaches...
• YOLO / Arms Race
• “Ethical AI”
• “Responsible AI”
• “Data Dignity”
• Equitable AI
• “Open” / Open-source ...

... and a global patchwork of laws ...
• Copyright
• Privacy
• Rights of Publicity
• Trademark
• Moral Rights
• Consumer Protection
• Trade Secret...

how to begin shaping the AI future we want?
How to begin shaping the AI future we want:

• Don't wait for the legal and regulatory questions to play out.
• But also: Don’t go YOLO. Lay out a values-based approach.
• Build practical, win-win-win AI solutions for all stakeholders.
• Adapt / evolve the open-source tradition (whatever we call it).
• Form a community with like-minded players.


